Add remaining-buffer bound check in _array_of_documents_to_buffer#2872
Add remaining-buffer bound check in _array_of_documents_to_buffer#2872AgentGymLeader wants to merge 2 commits into
Conversation
In `_cbson_array_of_documents_to_buffer`, the per-element embedded-document length `value_length` (read from server-controlled raw-batch BSON bytes) was checked only for a lower bound (`value_length < BSON_MIN_SIZE`) before being passed to `pymongo_buffer_write(buffer, string + position, value_length)`, which calls `memcpy` from the source buffer. There was no upper-bound check that `value_length` fits within the remaining array bytes, so a crafted reply (firstBatch/nextBatch) with an oversized inner `value_length` causes an out-of-bounds read from the heap. The pure-Python twin `_get_object_size` in `bson/__init__.py` already validates this invariant; the C fast-path dropped it. This commit adds the missing guard immediately before the `pymongo_buffer_write` call, mirroring the adjacent error-handling style and the existing lower-bound block. The loop already guarantees `position < size` and `(size - position) >= BSON_MIN_SIZE` via the prior check, so the subtraction `size - position` is safe (no uint32 underflow). The bug is reachable from `Collection.find_raw_batches()` and `aggregate_raw_batches()` when the driver connects to a malicious or compromised server. It belongs to the same bug class as CVE-2024-5629 (OOB read in the bson C extension, fixed in 4.6.3) but affects a different function. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Hi team! Could a maintainer please authorize the Evergreen CI patch run when you get a chance? No rush at all — just wanted to flag it so the checks can get started. Thanks so much! 🙏 |
|
Hi @AgentGymLeader, Your new test already passes on master. Can you provide a reproduction script that shows this issue can actually occur? |
|
Hi @NoahStapp, thanks for taking a look! You're right that the new test passes on master — but I believe the issue is that the test verifies the added code path works, while the underlying heap over-read it prevents can still occur on unpatched builds. Here's a minimal script to illustrate the difference: import struct
import bson._cbson as c
# BSON array where one element declares value_length=50,
# but only 7 bytes remain in the input buffer.
outer_size = 18
data = struct.pack('<I', outer_size) # outer BSON size (4 bytes)
data += b'\x03' # type = embedded document
data += b'0\x00' # key "0"
data += struct.pack('<I', 50) # declared sub-doc length = 50 ← lie: only 7 bytes remain
data += b'\x00' * 6 # partial content (6 bytes)
data += b'\x00' # outer null terminator
try:
c._array_of_documents_to_buffer(data)
except Exception as e:
print(e)
# Without this PR: "bad object or element length"
# ↑ caught AFTER pymongo_buffer_write reads 50 bytes from a 7-byte window
# With this PR: "invalid array content"
# ↑ caught BEFORE the over-read, at the new bounds checkThe problem is that To confirm the over-read itself: building with Happy to add an ASan-annotated test or adjust the repro if that would help! |
|
Please adjust the test so that it correctly tests only the fix's new error rather than passing with the existing fallback error on master. |
ea14897 to
b432ee7
Compare
|
@NoahStapp good catch, thanks. Tightened it to |
Asserts an embedded-document length larger than the remaining array bytes raises InvalidBSON, and that a valid buffer still decodes.
b432ee7 to
c3e9de2
Compare
What
Adds a single upper-bound check on the per-element embedded-document length
(
value_length) in_cbson_array_of_documents_to_buffer(inbson/_cbsonmodule.c), immediately before thepymongo_buffer_write/memcpycall that copies the element into the output buffer.Why
_cbson_array_of_documents_to_bufferis the C fast-path used byCollection.find_raw_batches()andaggregate_raw_batches()to convert araw BSON array (from a server
firstBatch/nextBatch) into a flat streamof BSON documents.
The function reads a per-element length
value_lengthfrom the raw bytes andthen passes it directly to
pymongo_buffer_write(buffer, string + position, value_length). Before this patch it only checked the lower bound:There was no corresponding upper-bound check that
value_lengthdoes notexceed the bytes remaining in the array document (
size - position). Acrafted raw-batch reply — from a malicious or compromised server — with an
inner
value_lengthlarger than the remaining buffer causespymongo_buffer_write→memcpyto read past the end of the heap allocation,constituting an out-of-bounds read.
The pure-Python counterpart
_get_object_sizeinbson/__init__.pyalreadyenforces this invariant. The C fast-path dropped it.
The loop already guarantees
position < sizeand(size - position) >= BSON_MIN_SIZEvia the prior guard (line ~3221), so thesubtraction
size - positioncannot underflow (both variables areuint32_tand
position < sizeis assured).This belongs to the same bug class as CVE-2024-5629 (out-of-bounds read in
the bson C extension, fixed in 4.6.3) but affects a different function
(
_cbson_array_of_documents_to_buffervs the one patched in 4.6.3).Change scope
bson/_cbsonmodule.c